Developing similarity matrices for antibody-protein binding interactions

The inventions of AlphaFold and RoseTTAFold are revolutionizing computational protein science due to their abilities to reliably predict protein structures. Their unprecedented successes are due to the parallel consideration of several types of information, one of which is protein sequence similarity information. Sequence homology has been studied for many decades and depends on similarity matrices to define how similar or different protein sequences are to one another. A natural extension of predicting protein structures is predicting the interactions between proteins, but similarity matrices for protein-protein interactions do not exist. This study conducted a mutational analysis of 384 non-redundant antibody–protein antigen complexes to calculate antibody-protein interaction similarity matrices. Every important residue in each antibody and each antigen was mutated to each of the other 19 commonly occurring amino acids and the percentage changes in interaction energies were calculated using three force fields: CHARMM, Amber, and Rosetta. The data were used to construct six interaction similarity matrices, one for antibodies and another for antigens using each force field. The matrices exhibited both commonalities, such as mutations of aromatic and charged residues being the most detrimental, and differences, such as Rosetta predicting mutations of serines to be better tolerated than either Amber or CHARMM. A comparison to nine previously published similarity matrices for protein sequences revealed that the new interaction matrices are more similar to one another than they are to any of the previous matrices. The created similarity matrices can be used in force field specific applications to help guide decisions regarding mutations in protein-protein binding interfaces.


Introduction
After decades of research, the inventions of AlphaFold [1] and RoseTTAFold [2] have arguably solved the protein folding problem. Although there are important fringe cases where these methods do not yet succeed [3] and they do not work for some closely related problems [4,5], they can correctly predict the structures of the vast majority of proteins. This will result in an increasing shift in the focus of computational protein research from structure prediction to function prediction.
Protein-protein interactions (PPIs) are crucial for biological systems to function correctly [6–9]. These interactions are complex and are influenced by many factors. Deciphering the details of PPIs requires the use of both physical chemistry analysis and observed interactions in experimentally determined protein complexes [10,11]. Protein interfaces have been extensively studied to develop a detailed understanding of the forces and recognition processes at a molecular level [11–19]. This knowledge can be expanded by accounting for mutations, which can cause proteins' conformations and interface properties to change [20,21]. There are available databases of experimentally measured changes in binding energies between proteins caused by mutations [22–24], and several computational methods have been developed to predict these changes [25–31]. A significant idea that has emerged from that prior literature is the importance of hotspot residues to PPIs, where a hotspot is a residue that is disproportionately important to a PPI.
RoseTTAFold simultaneously considers sequence alignments, distances between residues, and three-dimensional coordinates of backbone atoms in parallel tracks [2]. Similarly, the "trunk" of AlphaFold's algorithm considers sequence alignment and residue distance information [1]. Although there are many factors that contribute to the successes of these programs, an essential component of both algorithms was the inclusion of sequence alignment information, which is based on similarity matrices. Among the oldest similarity matrices are the Point Accepted Mutation (PAM) matrices [32] and the Blocks Substitution Matrices (BLOSUM) [33], both of which have a long history of use in protein sequence alignment. Since the creation of those matrices, many other similarity matrices have been developed [34–45]. However, all of those matrices are for sequence similarity applications and we were unable to identify previously published similarity matrices for PPIs. Given the ongoing shift in emphasis to protein function prediction research, the importance of and significant prior research about PPIs, the value of similarity matrices for sequence alignment in protein structure prediction, and the lack of PPI similarity matrices, we sought to create similarity matrices for PPIs. Specifically, this work focuses on similarity matrices for PPIs in antibody-protein complexes.
While many similarity matrices have been developed in the last several decades, it is the authors' experience that the BLOSUM matrices in particular and the PAM matrices to a lesser extent remain the go-to starting points for similarity matrices for many researchers. This may be due to the fact that only PAM and BLOSUM matrices are selectable scoring matrices on the National Institutes of Health's protein BLAST website [46]. As such, the PAM and BLOSUM similarity matrices were used as a conceptual reference in the development of matrices in this work. The PAM and BLOSUM matrices were created from analyses of naturally occurring protein sequences. However, two key differences guided the work in this study toward the development of computationally calculated PPI matrices. First, the analyses for PAM and BLOSUM assessed mutation rates at equivalent positions in evolutionarily related protein sequences. While there are established methods to define protein homology and thus equivalence between different positions, to the authors' knowledge there is no established method to define equivalency between different positions in protein interactions. As the authors were unable to identify diverse datasets of evolutionarily related PPIs to compare to one another, matrices generated from naturally occurring data could not be created. The second difference is that the effect of a mutation on the Gibbs Free Energy of a PPI is quantifiable, allowing for the determination of how a mutation changed the PPI. For these reasons, the choice was made to calculate PPI similarity matrices for the archetypical protein binding interaction, antibodies binding to protein antigens, using the CHARMM [47], Amber [48], and Rosetta [49] molecular mechanics force fields.

Data generation
A total of 384 antibody-protein complexes from a non-redundant database [50] were analyzed in this study. The complex from each PDB file was first minimized in CHARMM to add any missing atoms and correct conflicts between the experimental structure and the force field energy potential. The CHARMM top_all22_prot_cmap.inp topology and par_all22_prot_gbsw.inp parameter files were used for all calculations, along with the Fast Analytical Continuum Treatment of Solvation [51]. Each CHARMM-minimized complex was subsequently minimized in Amber and Rosetta. The Amber ff14SB force field [52] was used to perform the energy minimizations and calculations, with the implicit Generalized Born solvation model with surface area used with flags igb = 2 and gbsa = 1. The REF15 parameterization of Rosetta [49] was used for all calculations in Rosetta. Structures were minimized in CHARMM first as we found it was better able to resolve conflicts in the experimental structures than Amber and Rosetta.
The wild-type interaction energy of each complex was calculated according to Eq 1, where IE is the interaction energy of the complex, E_complex is the force field-calculated energy of the minimized complex, E_Ab is the energy of the antibody alone from the minimized complex, and E_Ag is the energy of the antigen alone from the minimized complex. Prior analysis had revealed that the energy contributions of residues to binding in antibody-protein interfaces follow an exponential decay and that only a few residues contribute most of the binding energy [50]. Further, on average the 8th most-important residue contributes less than 5% of the total binding energy in antibodies and antigens with all three force fields, while the average 7th most-important residue contributes more than 5% in antibodies in all three force fields and in antigens with Rosetta. Using CHARMM and Amber, the 7th most-important antigen residues contribute an average of 4.9% and 4.7% of the total binding energy, respectively. Therefore, the hotspot residues were defined as the seven residues that contributed the most to binding, with both the antibodies and the antigens having their own hotspots identified. Because the energy potentials of the force fields differ from one another, the hotspot residues in each complex were determined on a force field specific basis. Each hotspot residue in each of the complexes was mutated to each of the other 19 common amino acids. The mutations were carried out by identifying the lowest energy rotamer from a library [53] and then minimizing the energy of the complex using the same protocol as for the experimental structures. The percentage change in interaction energy (PC_IE) for each mutant was then calculated using Eq 2, with IE_Mut and IE_WT being the interaction energies of the mutant and wild-type complexes, respectively, as calculated by Eq 1.
The IE_WT values are all negative, which is to be expected as they are closely related to the changes in Gibbs Free Energy upon binding for complexes that are experimentally proven to bind. Mutations that are predicted to improve binding correspond to those with IE_Mut values more negative than the IE_WT values. Thus, Eq 2 calculates a positive value for mutations that are predicted to improve binding and a negative value for those that are predicted to worsen it.
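The energy bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes Eq 1 has the usual form IE = E_complex - E_Ab - E_Ag and that Eq 2 is the relative change 100 × (IE_Mut - IE_WT) / IE_WT, which is the form consistent with the sign convention stated in the text. The function names, the per-residue contribution dictionary, and all numerical values are hypothetical.

```python
def interaction_energy(e_complex, e_antibody, e_antigen):
    """Assumed form of Eq 1: complex energy minus the energies of the separated partners."""
    return e_complex - e_antibody - e_antigen


def percent_change(ie_mut, ie_wt):
    """Assumed form of Eq 2: relative change in interaction energy, in percent.

    ie_wt is negative for a bound complex, so a more negative ie_mut
    (better binding) gives a positive result.
    """
    return 100.0 * (ie_mut - ie_wt) / ie_wt


def select_hotspots(residue_contributions, n_hotspots=7):
    """Return the n residues with the most negative (most favorable) contributions.

    Assumes contributions are energies, so "contributes the most" means most negative.
    """
    ranked = sorted(residue_contributions, key=residue_contributions.get)
    return ranked[:n_hotspots]


# Hypothetical energies in arbitrary units, for illustration only.
ie_wt = interaction_energy(-1050.0, -600.0, -400.0)
ie_weaker = interaction_energy(-1040.0, -600.0, -400.0)
ie_stronger = interaction_energy(-1060.0, -600.0, -400.0)
print(percent_change(ie_weaker, ie_wt))    # negative: binding is predicted to worsen
print(percent_change(ie_stronger, ie_wt))  # positive: binding is predicted to improve
```

Because hotspots are force field specific, a function like `select_hotspots` would be called once per force field and per binding partner with that force field's own per-residue contributions.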

Matrix calculation
There are 380 types of mutations (i.e., each of the 20 common amino acids to each of the other 19 amino acids) for each force field (i.e., CHARMM, Amber, and Rosetta) and for each protein type (i.e., antibodies and antigens). For each dataset, the median value was selected as the representative value of the mutation from amino acid i to amino acid j. Approximately 70% of the datasets exhibited non-Gaussian behavior due to the presence of extremely detrimental outliers. The presence of these outliers was expected, as it is possible to introduce mutations that cause major, irreconcilable steric clashes. The median is the preferred representation over the mean for datasets with a small number of major outliers [54].
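As a rough illustration of why the median was preferred, the following sketch groups percentage changes by mutation type and takes the median of each group. The record layout `(from_aa, to_aa, pc_ie)` and the numbers are hypothetical; the third tryptophan-to-alanine entry mimics a severe steric clash.

```python
from collections import defaultdict
from statistics import mean, median


def representative_values(records):
    """Group percentage changes by (from_aa, to_aa) and return the median of each group."""
    groups = defaultdict(list)
    for from_aa, to_aa, pc_ie in records:
        groups[(from_aa, to_aa)].append(pc_ie)
    return {pair: median(values) for pair, values in groups.items()}


# Hypothetical percentage changes in interaction energy.
records = [
    ("W", "A", -12.0),
    ("W", "A", -15.0),
    ("W", "A", -900.0),  # outlier from an irreconcilable clash
    ("S", "F", 1.0),
    ("S", "F", 3.0),
]
reps = representative_values(records)
print(reps[("W", "A")])               # median is unaffected by the single outlier
print(mean([-12.0, -15.0, -900.0]))   # mean is dominated by it
```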
The PAM and BLOSUM similarity matrices for protein sequences were developed through different methods, but share several key similarities: they are symmetrical, their values are integers, and they have values for all entries in the matrices, including conserving the current amino acid rather than changing it. The reason the PAM and BLOSUM matrices are symmetrical arises from their comparison of known protein sequences. If protein A has amino acid X1 and protein B has amino acid X2 at equivalent positions, then it is equally valid to say that the mutation is X1 → X2 as it is to say that the mutation is X2 → X1. Thus, the number of times X1 mutates to X2 in a set of protein sequences is identical to the number of times X2 mutates to X1. For the interface mutations being studied here, that is not the case. These mutations have a direction: from an existing complex to a putative complex. A consequence of this is that the effects and scores of mutating X1 → X2 may be very different from mutating X2 → X1.
While similarity matrices for interface mutations should not be symmetrical, it is possible to generate versions that share the other features of PAM and BLOSUM. The first step in doing so is to determine appropriate numerical scores for retaining a given amino acid rather than mutating it. In PAM and BLOSUM, the scores were the percentage occurrence of each amino acid, and as a result, each row sums to one. Here, we chose to have the percentage change in binding energy for each amino acid sum to zero. In other words, the percentage change for retaining a given amino acid was equal to the negative of the sum of all the percentage changes for mutating it.
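The zero-sum convention for the diagonal entries can be illustrated with a short sketch. The nested-dictionary layout `rep[i][j]` and the uniform value of -2.0 are hypothetical and chosen only to make the arithmetic easy to follow.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 common amino acids


def add_retention_values(rep):
    """Return a copy of rep with V[i][i] set so that every row sums to zero.

    rep[i][j] is the representative percentage change for mutating i to j (i != j).
    """
    filled = {}
    for i in AMINO_ACIDS:
        row = {j: rep[i][j] for j in AMINO_ACIDS if j != i}
        row[i] = -sum(row.values())  # retention value offsets the 19 mutation values
        filled[i] = row
    return filled


# Hypothetical input: every mutation changes the interaction energy by -2%.
rep = {i: {j: -2.0 for j in AMINO_ACIDS if j != i} for i in AMINO_ACIDS}
filled = add_retention_values(rep)
print(filled["A"]["A"])  # 38.0, the negative of 19 mutations at -2.0 each
```

Under this convention, a residue whose mutations are mostly detrimental receives a large positive value for being retained, and a residue whose mutations are mostly beneficial receives a negative one.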
Eq 3 was used to calculate the scores for the similarity matrices, where S_i,j is the score for mutating amino acid i to amino acid j, V_i,j is the representative value for mutating (or retaining) the amino acid, and round is the standard function of rounding a decimal number to an integer (e.g., 1.49 rounds to 1 while 1.50 rounds to 2). Logarithmic scaling was used in the BLOSUM and PAM matrices, so it was also used here. The absolute value of V_i,j was used inside the logarithm to ensure all calculated values were real, and the plus one was used to guarantee that all outputs of the logarithm function were greater than or equal to 0. The leading fraction is used to assign the correct sign to the score, with beneficial mutations having positive scores and detrimental ones having negative scores.
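A hedged sketch of the scoring step follows. It implements the structure described for Eq 3, a sign factor multiplied by the rounded logarithm of |V| + 1, but two details are assumptions made only for illustration: the logarithm base (base 2 here) and the units of V. Python's built-in `round` uses round-half-to-even, so a separate half-up helper is used to match the rounding described in the text.

```python
import math


def round_half_up(x):
    """Round a non-negative number to the nearest integer, with .5 rounding up."""
    return math.floor(x + 0.5)


def similarity_score(value, log_base=2.0):
    """Signed, rounded, log-scaled score for one representative value.

    log_base is an assumption; the text does not state the base.
    A value of exactly zero is scored as zero to avoid dividing 0 by |0|.
    """
    if value == 0:
        return 0
    magnitude = round_half_up(math.log(abs(value) + 1.0, log_base))
    return magnitude if value > 0 else -magnitude


print(round_half_up(1.49), round_half_up(1.50))  # 1 2
print(similarity_score(7.0), similarity_score(-7.0))
```

Adding one inside the logarithm keeps small representative values from producing negative magnitudes, so the sign of the score always comes from the sign of the representative value itself.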

Matrix comparisons
Two sets of calculations were conducted to evaluate how similar two different matrices are to one another. The first is an average difference score, as calculated by Eq 4.
Here, D_m,n is the average difference between matrices m and n, S_i,j,m is the score of mutating amino acid i to amino acid j in matrix m, and S_i,j,n is the same in matrix n. The summation in the numerator is divided by 400 as there are 20 × 20 possible mutations when scores for retaining a particular amino acid are included. These average differences represent how much a particular matrix favors (positive) or disfavors (negative) mutations in general compared to another matrix. The second calculation is an error calculation using Eq 5,
$$E_{m,n} = \sqrt{\frac{\sum_{i=1}^{20} \sum_{j=1}^{20} \left( S_{i,j,m} - S_{i,j,n} - D_{m,n} \right)^{2}}{400}}$$
where E_m,n is the square root of the average of the squared difference between the matrices of the scores for making mutations adjusted by the average difference calculated in Eq 4. The use of D_m,n in this calculation means the adjusted differences average to zero.
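The two comparison measures can be sketched as follows. This is an illustrative reading of the descriptions of Eq 4 and Eq 5, assuming that the average difference is taken as matrix m minus matrix n; the function names and the small 2 × 2 matrices are hypothetical stand-ins for the 20 × 20 matrices, for which the denominator would be 400.

```python
import math


def average_difference(m, n):
    """Mean of the element-wise differences m[i][j] - n[i][j] (assumed form of Eq 4)."""
    size = len(m)
    total = sum(m[i][j] - n[i][j] for i in range(size) for j in range(size))
    return total / (size * size)


def error_score(m, n):
    """Root mean square of the differences after removing the average difference (Eq 5)."""
    size = len(m)
    d = average_difference(m, n)
    squared = sum((m[i][j] - n[i][j] - d) ** 2 for i in range(size) for j in range(size))
    return math.sqrt(squared / (size * size))


# Hypothetical 2 x 2 matrices for illustration.
m = [[1, 2], [3, 4]]
n = [[0, 0], [0, 0]]
shifted = [[2, 3], [4, 5]]  # m plus a constant offset of 1

print(average_difference(m, n))   # 2.5
print(average_difference(n, m))   # -2.5, the sign flips when the order is swapped
print(error_score(m, shifted))    # 0.0, a constant offset contributes no error
```

Subtracting the average difference means the error score measures disagreement in the pattern of scores rather than in their overall level.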

Results
Table 1 shows the representative values for mutations to antibody residues as calculated by the CHARMM force field, and the results appear reasonable. The only two amino acids that exhibit a general benefit for mutation are alanine and glycine. Alanine and glycine have the smallest side chains of any amino acid: a methyl group and a hydrogen, respectively. As described in the Methods, the mutations were only made to the seven most important residues in an antibody for binding. If alanine and glycine are important to binding, it is likely because of contributions from their backbone atoms, which every amino acid shares. Thus, it is possible for mutations to introduce new, favorable interactions without eliminating a beneficial interaction. The inverse is true, too, that mutations could introduce detrimental interactions, but it appears that on average important alanine and glycine residues in antibodies are mutable.
Table 1. The table should be read as mutating from the residue at the start of a row to the amino acid at the top of a column. Alanine and glycine were the only two amino acids that showed a general benefit for mutating away from their current amino acid to another. All other amino acids preferred to remain unmutated. https://doi.org/10.1371/journal.pone.0293606.t001
Table 2 is the CHARMM-calculated similarity matrix for mutations in antibodies. The values were calculated from those in Table 1 using Eq 3. Nearly every mutation has a negative score, meaning it would be detrimental to binding. All of the antibodies being analyzed in this study are naturally occurring complexes that have been affinity-matured to bind strongly to their target antigens. It is to be expected that they should have few if any possible beneficial mutations remaining. The most positive mutation score, 2, occurs for both alanine and glycine mutations and is noticeably smaller than the smallest non-mutation score, 5, which occurs for isoleucine, leucine, proline, and valine.
The charged amino acids, arginine, lysine, aspartic acid, and glutamic acid, had average mutation scores near -3, with values ranging between -3.11 for arginine and -2.53 for lysine. The aromatic amino acids, phenylalanine, tryptophan, and tyrosine, had similar average mutation scores, with values of -2.74, -3.05, and -3.05, respectively. Mutations to the polar amino acids, cysteine, asparagine, glutamine, serine, and threonine, were somewhat less detrimental, with average scores between -2.16 for glutamine and -1.79 for cysteine. Cysteine was the only charged, aromatic, or polar amino acid with any favorable mutation scores, which were one for each of histidine, serine, and tryptophan. The nonpolar amino acids, alanine, glycine, isoleucine, leucine, methionine, proline, and valine, exhibited different behavior. As already stated, alanine and glycine were the only two amino acids with negative scores for non-mutations. The average mutation for the other amino acids was detrimental, but each of them had at least one mutation that was predicted to be either neutral (i.e., a score of zero) or beneficial. For proline, these were mutations to the other nonpolar amino acids leucine and valine. Each of the other nonpolar amino acids had a neutral or beneficial mutation to an aromatic, charged, or polar amino acid.
Table 2. Most mutations are predicted to be detrimental on average, but there is a general trend of mutations becoming less detrimental when considering amino acids by groups going from charged to aromatic to polar to nonpolar. Alanine and glycine were the only amino acids that showed average benefits for being mutated rather than retained in an interface. https://doi.org/10.1371/journal.pone.0293606.t002
Table 3 is the similarity matrix for mutations to important antibody residues calculated using Amber. The representative values used to create the matrix are in the S1 Table. Overall, the trends of this similarity matrix are similar to those calculated by CHARMM, with individual values that are a small amount more negative. A key difference is that the highest scores for all amino acids, including alanine and glycine, are for not mutating the residue. Amber predicts that mutations to charged residues are most detrimental, with the mutation of any charged amino acid to any non-charged amino acid having a score of -4. Mutations to aromatic amino acids are also consistently disfavored, with all mutations having a score of -3 except for phenylalanine to glutamic acid, which has a score of -2. Continuing the similar trends from the CHARMM matrix, mutations of polar amino acids were less disfavored than those of charged or aromatic residues while being more penalized than mutations of nonpolar amino acids. Cysteine, asparagine, serine, and threonine all have average mutation scores between -3.00 for asparagine and -2.79 for cysteine and threonine, which are approximately one point more negative than predicted by CHARMM. Histidine is a small outlier among the polar residues, as it has an average mutation score of -2.16. Histidine is also the first amino acid to have any mutations that are predicted by Amber to typically be beneficial (lysine) or neutral (glutamine). As it did with the other amino acid types, Amber predicted that the typical effect of mutations of nonpolar amino acids was more detrimental than CHARMM did. However, they remain the least disfavored mutations and every nonpolar amino acid except methionine has at least one mutation to a charged or polar amino acid that has a favorable score.
Table 3. The Amber-calculated similarity matrix for antibody mutations. The broad trends of the matrix are similar to those calculated by CHARMM in that most mutations are detrimental, mutations of charged amino acids are the most detrimental, and mutations of nonpolar amino acids are the least detrimental. A meaningful difference between Tables 2 and 3 is that Amber predicts mutations to be more detrimental than CHARMM does. https://doi.org/10.1371/journal.pone.0293606.t003
Table 4 is the similarity matrix for mutations to important antibody residues as calculated by Rosetta, with the representative values used to calculate the matrix in the S2 Table. While there are similarities between the Rosetta matrix and the CHARMM and Amber matrices, there are also several key differences. Mutations to the charged and aromatic residues remain consistently strongly disfavored, but the trends differ for the polar and nonpolar amino acids. Each of the nonpolar amino acids isoleucine, leucine, methionine, and valine has only detrimental scores for mutating to any other residue, and their average scores are more negative than in the other force fields. Another outlier is cysteine, which Rosetta predicts to be on average the second most unfavorable residue to mutate after tryptophan. In contrast to the other nonpolar amino acids, alanine, glycine, and proline each have at least one amino acid that is predicted to be typically beneficial to mutate the residue into. Notably, for glycine that residue is tryptophan, which represents a change from the smallest amino acid to the largest. Finally, while still detrimental on average, Rosetta's similarity matrix assigns less punitive scores to serine mutations than CHARMM or Amber do. Of the three force fields, Rosetta is the only one to predict that any mutations of serine would typically be beneficial, which it does for phenylalanine, isoleucine, and valine.
Table 4. While all amino acids have positive scores for not mutating, some of the other trends differ compared to the CHARMM and Amber matrices. In particular, mutations of cysteine, isoleucine, leucine, methionine, and valine are more strongly disfavored by Rosetta compared to the other force fields. In contrast, mutations of serine are less disfavored by Rosetta, which is the only force field to predict that there are specific mutations of serine (i.e., to phenylalanine, isoleucine, and valine) that are typically beneficial. https://doi.org/10.1371/journal.pone.0293606.t004
The antibodies in the evaluated complexes have undergone an affinity maturation process to improve their binding affinities with their target antigens. In contrast, the bound proteins have not been changed to better bind to the antibodies. To explore how evolutionary direction may have impacted the similarity matrices, additional matrices were made for each force field for the antigens. Table 5 is the similarity matrix for the important residues in antigens calculated using CHARMM. The representative values used to calculate the matrix are in the S3 Table. The trends share many similarities with the matrix for the antibody mutations. Mutating the charged amino acids is consistently predicted to be the most detrimental, followed closely by the aromatic residues. Mutations of the polar amino acids are on average less detrimental than those of charged or aromatic residues, but none of the polar amino acids has any individual mutation that is on average predicted to be beneficial and only cysteine has any (to alanine, methionine, and tryptophan) that are predicted to be neutral. In contrast, every nonpolar amino acid except for leucine (three neutral mutations) and methionine (all detrimental mutations) has at least one individual mutation that is predicted to be beneficial. The most meaningful distinction among the nonpolar amino acids in the antigens versus the antibodies is that mutations of glycine are expected to be, on average, detrimental while those to valine are beneficial. On the whole, the impact of mutations on the most important antigen residues is less detrimental than those to antibodies, with the total of all scores in Table 5 being -581 versus the -635 of Table 2.
Table 5. The trends compared to the CHARMM matrix for antibodies are very similar. Charged and aromatic residues are the most detrimental to mutate. Polar amino acids are the next most detrimental, while nonpolar residues are the least detrimental to mutate. The total sum of scores in this matrix is 54 points larger than that of the antibody matrix, indicating that antigen mutations are on average better tolerated than antibody mutations. https://doi.org/10.1371/journal.pone.0293606.t005
Table 6 is the similarity matrix for the important residues in antigens calculated using Amber from the representative values in the S4 Table. The trends it shows are qualitatively similar to those in Table 3. In particular, mutations of charged amino acids are the worst scoring, mutations of aromatic residues are also predicted to be detrimental, and the scores are on average more negative than the corresponding CHARMM-calculated scores. The calculated mutation scores for polar residues are also quite similar to their antibody counterparts. Interestingly, that is not the case for the nonpolar residues, which are predicted to have more detrimental impacts on binding than the corresponding antibody mutations. Alanine to lysine, alanine to arginine, and proline to arginine are the only mutations that have positive mutation scores. Unlike the CHARMM matrices, the sum of all scores in this matrix is 53 points more negative than the corresponding antibody matrix (-936 versus -883).
Table 6. The Amber-calculated similarity matrix for antigen mutations. While the trends in this matrix are very similar to those in Table 3, mutations of important nonpolar residues in antigens have worse scores and thus are typically more detrimental to binding than those for antibodies. https://doi.org/10.1371/journal.pone.0293606.t006
The final similarity matrix calculated in this study was for mutations to important residues in antigens using the Rosetta force field. It is shown in Table 7 and the corresponding representative values are in the S5 Table. As was the case for the mutations to important antibody residues shown in Table 4, the trends have some similarities to the CHARMM and Amber data with some striking discrepancies. As has been the case in every similarity matrix, mutations of the charged and aromatic amino acids are strongly disfavored. The average mutation score of cysteine is less negative in the antigens than the antibodies (-2.74 versus -3.53) but is still clearly disfavored. Its mutations are about as disfavored as those of glutamine (-2.84) and threonine (-2.74), two of the other polar amino acids. However, asparagine and histidine have average mutation scores of -2.00 and -1.89, respectively, which are less negative than seen by any calculations with Amber or CHARMM and are also less detrimental than any nonpolar residue except glycine in this matrix. Rosetta predicts that mutations to isoleucine, leucine, methionine, and valine are all typically detrimental, while alanine and proline have neutral or favorable mutations, respectively, only to tryptophan. The two amino acids with multiple mutations predicted to be typically beneficial are glycine and serine. Mutations of glycine are typically beneficial when they are changes to alanine, cysteine, phenylalanine, leucine, methionine, valine, and tryptophan and are neutral when they are changes to glutamine or tyrosine. Mutations of serine are typically beneficial when they are changes to cysteine, phenylalanine, methionine, and tyrosine and are neutral when they are changes to leucine, valine, and tryptophan. Notable in those two lists is the preponderance of nonpolar and aromatic amino acids capable of making hydrophobic interactions. Overall, the sum of all scores in this similarity matrix is less detrimental than those in the antibody matrix by 40 points (-819 versus -859), which is similar to the CHARMM results.
It is of interest to quantitatively compare the six calculated antibody-protein PPI similarity matrices to one another, as well as to previously published similarity matrices for protein sequence comparisons. Nine such matrices were selected for the comparison: two each of the PAM (i.e., PAM30 and PAM60) and BLOSUM (i.e., BLOSUM62 and BLOSUM80) families of matrices, the matrix of Miyazawa and Jernigan based on contact frequencies in protein structures [34], the matrix of Qian and Goldstein for generating accurate alignments [41], the matrix of Saigo, Vert and Akutsu for finding distant homologues [42], the matrix of Yamada and Tomii for finding distant homologues [43], and the matrix of Jia and Jernigan for considering pairs of substitutions in densely packed globular proteins [45].
The average difference scores as calculated by Eq 4 are shown in Fig 1. Values on the diagonal are not shown because they would all be 0, and values above the diagonal are not shown because they are the negatives of the corresponding values below the diagonal. The values show that the antibody and antigen matrices calculated with the same force field have low average difference scores. The Rosetta and Amber matrices are also similar to one another, though not to the CHARMM matrices. Almost all of the sequence similarity matrices show large average differences with the PPI matrices calculated in this work. The most notable exception is the similarity between the BLOSUM80 matrix and the CHARMM matrices, which have quite small average differences.
Table 7. The data share similarities with Table 4. In particular, mutations of isoleucine, leucine, methionine, and valine are more detrimental than in other matrices while mutations of serine, especially to nonpolar residues, are less detrimental. https://doi.org/10.1371/journal.pone.0293606.t007
The error scores calculated with Eq 5 are shown in Fig 2. The lowest error scores are between the set of the two BLOSUM matrices, the matrix of Saigo, Vert and Akutsu, and the matrix of Jia and Jernigan, suggesting those four matrices share common features. The error scores between PPI matrices calculated in this work by the same force field (e.g., the CHARMM antibody and antigen matrices) are also small compared to the other values in the matrices. The six matrices from this work have relatively low error scores compared to one another, with the maximum score between any of the six matrices being 1.509 between the antigen matrices of CHARMM and Rosetta. In contrast, the minimum score of any of those six matrices with any of the nine sequence similarity matrices is 1.922 between the Rosetta antibodies matrix and the matrix of Saigo, Vert and Akutsu. This indicates that while the six antibody-antigen interaction similarity matrices have differences between one another, they have more in common with one another than they do with previous matrices for protein sequence similarity.

Discussion
The invention of reliable and accurate machine learning-based algorithms for protein structure prediction is transforming computational protein science and engineering. One of the key components of AlphaFold [1] and RoseTTAFold [2] is the extraction of information from similar protein sequences. The alignment of protein sequences is heavily dependent on similarity matrices, including PAM [32] and BLOSUM [33]. As machine learning method development shifts towards the prediction and design of PPIs, a possible strategy is to mimic what worked in AlphaFold and RoseTTAFold and include interaction similarity information.
This work described the calculation of antibody-protein interaction similarity matrices using three force fields, CHARMM [47], Amber [48], and Rosetta [49]. Each of these force fields has a long history of use in the fields of computational protein science and engineering and has been developed and optimized over the course of decades by hundreds of researchers. At the outset of the study, it was anticipated that the calculated matrices would exhibit similar properties but with force field dependent variations that would be relevant for different projects. Given that AlphaFold uses Amber as part of its final refinement of structures [1] while RoseTTAFold is integrated into the larger suite of Rosetta software, it seems likely that PPI methods will also utilize various force fields.
This study analyzed 384 nonredundant complexes of antibodies binding to protein antigens. Antibody-antigen complexes were chosen for several reasons. First, antibodies are the archetypical binding protein. Due to their important and widespread therapeutic and experimental applications, they have been extensively studied in prior experimental and computational literature [50,55–59]. Second, antibodies undergo an affinity maturation process to improve their binding to antigens [60] while the antigens remain unchanged. This provides an opportunity to compare how similarity matrices for PPIs differ between proteins that have mutated to improve the affinity of the interaction and those that have not.
The analyses considered only the hotspot residues in the proteins, defined as the seven residues in each antibody and antigen that contributed the most to the complexes' binding affinities. Prior analyses have demonstrated that the most important residues contribute the significant majority of the binding energies in complexes, with the other residues in the interface making much smaller contributions [50]. The interfaces are much larger than the set of hotspot residues, with most having ~30 amino acids per protein. The choice was made to focus on only the most important residues to avoid biasing the data with binding energy changes from residues that were less important to the PPI. If these matrices prove to be of use, then future research can explore how they change when other force fields, other types of protein complexes, and additional residues are used in their calculation.
One feature observed while conducting this and prior work was that computationally predicted interaction energies often deviate significantly in magnitude from their experimental values. Further, we observed that the magnitude of the predicted energies increased with the buried surface area of the complexes. While the relative changes of energies for point mutations to complexes are informative, directly comparing the magnitudes across complexes was difficult. Therefore, Eq 2 was used to normalize the calculated values. Converting the energies into percentage changes facilitates the comparison of how important a given residue or mutation is between complexes.
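The effect of this normalization can be illustrated with a small sketch. Eq 2 is not reproduced in this section, so the formula and the sign convention below are assumptions: energies are taken to be negative for favorable binding, and a positive percentage is taken to mean that the mutant binds more favorably than the wild type. The function name `percent_change` and the energy values are hypothetical.

```python
def percent_change(e_wild_type, e_mutant):
    """Percentage change in interaction energy caused by a point mutation.

    Assumed convention (not taken from Eq 2): energies are negative for
    favorable binding, and a positive result means the mutant binds more
    favorably than the wild type.
    """
    if e_wild_type == 0:
        raise ValueError("wild-type interaction energy must be nonzero")
    return 100.0 * (e_wild_type - e_mutant) / abs(e_wild_type)


# Two hypothetical complexes with very different total interaction energies.
# The absolute losses differ (5 vs 20 energy units), but the relative
# importance of the mutated residue is the same in both.
small_complex = percent_change(-50.0, -45.0)    # -10.0
large_complex = percent_change(-200.0, -180.0)  # -10.0
```

Under this assumed convention, a residue that accounts for the same fraction of binding in a small and a large interface receives the same score, which is the comparison the raw magnitudes did not allow.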
The BLOSUM and PAM similarity matrices were the inspiration for this study, and therefore the matrices here were constructed to have similar features. In particular, the scores are all integer values, negative scores are detrimental while positive ones are beneficial, and the scores are logarithmically weighted. However, two key differences exist in the matrices. In PAM and BLOSUM, the scores for not mutating an amino acid are related to the frequency of that event in the protein sequences used to create the matrices. Here, all possible mutations were computationally calculated for each important residue so there was no corresponding natural score to use. Instead, the choice was made to have the sum of each row in the representative value matrices be equal to zero.
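One plausible reading of the zero row sum choice is that the diagonal ("no mutation") entry of each row is set to the negative of the sum of that row's off-diagonal entries. This is an interpretation rather than the documented procedure, and the function name and the 3x3 toy matrix below are hypothetical.

```python
def fill_diagonal_zero_row_sum(matrix):
    """Return a copy of a square matrix whose diagonal makes each row sum to zero.

    Assumption: the off-diagonal entries (mutation scores) are kept as-is and
    only the diagonal entry of each row is chosen to balance the row.
    """
    result = [row[:] for row in matrix]
    for i, row in enumerate(result):
        off_diagonal_sum = sum(v for j, v in enumerate(row) if j != i)
        row[i] = -off_diagonal_sum
    return result


# Toy 3x3 matrix of mutation values (illustrative only); the existing
# diagonal entries are ignored and overwritten.
toy = [
    [0, -1, -2],
    [1, 0, -3],
    [-2, -1, 0],
]
balanced = fill_diagonal_zero_row_sum(toy)
# balanced == [[3, -1, -2], [1, 2, -3], [-2, -1, 3]]
```

A consequence of this reading is that residues whose mutations are mostly detrimental receive a large positive diagonal entry, which loosely mirrors the high self-scores of conserved residues in BLOSUM and PAM.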
The other key difference is that the matrices calculated here are not symmetrical, because mutations can have different impacts on binding depending on their directionality. An example of this from Table 1 is that on average mutating glycine to arginine improved the predicted binding energy by 2.48% while mutating arginine to glycine worsened the predicted binding energy by 9.62%. This is to be expected: when arginine is important in a binding interface it is likely to be part of a salt bridge while glycine's contributions are likely to come from its backbone. Mutating glycine to arginine at that position could still contribute to the backbone interactions while creating the potential for a salt bridge, whereas mutating arginine to glycine is much more likely to remove a beneficial interaction. As these effects and magnitudes are not equal, similarity matrices for interface mutations should not be symmetrical.
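The directionality can be made concrete with the glycine/arginine values quoted above. The sketch below stores a matrix as a nested dictionary indexed as `matrix[from_residue][to_residue]`; that layout and the helper name `asymmetry` are illustrative assumptions, and only the two numbers come from the text.

```python
def asymmetry(matrix, a, b):
    """Difference between the a->b entry and the b->a entry of a matrix.

    A symmetric matrix (such as BLOSUM or PAM) returns 0 for every pair.
    """
    return matrix[a][b] - matrix[b][a]


# Average percentage changes quoted in the text for one residue pair:
# G->R improved binding by 2.48%, R->G worsened it by 9.62%.
pct_change = {
    "G": {"R": 2.48},
    "R": {"G": -9.62},
}

gap = asymmetry(pct_change, "G", "R")  # about 12.1 percentage points
```

A symmetric matrix would have to assign one score to both directions and would therefore lose the roughly 12 percentage point gap between them.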
Several features stand out in analyzing the similarity matrices in Tables 2-7. The most prominent are the consistent penalties for mutating charged and aromatic amino acids. When those residues are important to binding, mutating them resulted in the most negative scores with every force field for both antibodies and antigens. Interestingly, this held even for mutations within the groups. Mutating an aromatic residue to another aromatic residue or a charged amino acid to the other amino acid with the same charge still had correspondingly detrimental scores. This suggests that there are critical features in the interactions involving those residues that cannot easily be replaced even by similar chemical structures. Another commonality is that the scores for mutations in the antibody matrix and the antigen matrix calculated with the same force field were similar. For example, the scores for mutating charged residues were approximately -4 with Amber for both antibodies and antigens versus -3 with CHARMM and Rosetta.
While the CHARMM and Rosetta matrices showed somewhat less detrimental scores overall for the antigens, as was expected prior to calculation because the antigens had not been mutated to have improved binding with the antibodies, that was not the case with the Amber-calculated matrices. The overall trends in the matrices more closely matched the force field used to calculate them than the type of protein being mutated. In particular, CHARMM and Amber assigned more detrimental scores to mutations of polar amino acids than Rosetta did. In contrast, Rosetta had scores that were more punitive to mutations of nonpolar residues and more permissive of mutations of polar amino acids, especially serine.
To provide a more holistic assessment of the matrices compared to one another and to previous sequence similarity matrices, average difference scores and error scores were calculated. These values demonstrate that the matrices calculated by the same force field are very similar to one another. The difference scores between the CHARMM versus the Amber and Rosetta matrices were comparable to values with the sequence similarity matrices. However, the error calculations show that the six newly calculated matrices are more similar to one another than they are to any of the previously published sequence similarity matrices checked in this work. This suggests that there are effects of mutations in protein-protein interfaces that meaningfully differ compared to their effects on protein structures.
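The two scores can lead to different conclusions because a signed average lets opposite deviations cancel while a magnitude-based score does not. The sketch below illustrates that point with stand-in metrics only: the paper's own definitions are given in its methods and are not reproduced here, so the mean signed difference and root-mean-square difference used below, the restriction to off-diagonal entries, and the toy matrices are all assumptions.

```python
import math


def off_diagonal_pairs(a, b):
    """Pair up equivalent off-diagonal (mutation) entries of two square matrices."""
    n = len(a)
    return [(a[i][j], b[i][j]) for i in range(n) for j in range(n) if i != j]


def mean_signed_difference(a, b):
    """Stand-in for an average difference score: positive if `a` is more
    favorable to mutations than `b` on average."""
    pairs = off_diagonal_pairs(a, b)
    return sum(x - y for x, y in pairs) / len(pairs)


def rms_difference(a, b):
    """Stand-in for an error score: root-mean-square difference between
    equivalent mutation entries."""
    pairs = off_diagonal_pairs(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


# Toy matrices (illustrative only) that disagree on every mutation but in
# opposite directions, so the signed differences cancel.
matrix_a = [[0, 2, -2], [2, 0, -2], [-2, 2, 0]]
matrix_b = [[0, -2, 2], [-2, 0, 2], [2, -2, 0]]

avg = mean_signed_difference(matrix_a, matrix_b)  # 0.0
err = rms_difference(matrix_a, matrix_b)          # 4.0
```

In this toy case the signed average suggests the matrices are equivalent while the magnitude-based score shows they disagree on every mutation, which is why a small average difference alone does not establish similarity.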
The authors do note that there are aspects of the antibody-protein similarity matrices that appear to reflect the historical priorities of the force fields used in this study. CHARMM and Amber share similar histories [47,61], each beginning their development in the 1970s on code originally developed at Harvard for the purpose of studying macromolecules. Inherent in studying a protein is having a starting structure and examining how that structure interacts with its solvent and the other molecules in its system. In contrast, the initial development of Rosetta occurred in the 1990s for the purpose of predicting protein structures [62], and the dominant force in protein folding is the hydrophobic effect [63]. The observation that Rosetta disfavors mutations of nonpolar residues more than Amber and CHARMM do is consistent with the intended purposes for which each program was created. Rosetta's predictions that mutations of serine in antibody interfaces to phenylalanine, isoleucine, and valine are typically beneficial also contradict observed experimental trends in antibodies. In particular, antibodies have evolved so that there is an overabundance of serine in their binding surfaces [60]. It seems unlikely that mutations of a polar residue, which nature has chosen through evolution to be in antibody binding interfaces, to nonpolar residues should frequently be beneficial.

Conclusions
To the authors' knowledge, this work describes the first similarity matrices for PPIs, specifically for those in antibody-protein complexes. The calculated matrices have several interesting differences between them; however, the average difference and error scores shown in Figs 1 and 2 demonstrate that the matrices have more in common with one another than they do with previous matrices for sequence similarity. This indicates that there are important differences in how mutations impact protein-protein interactions compared to how they affect protein structures. While it seems likely that the matrices developed in this work inherently reflect the features of the force fields used to calculate them, this should in no way be construed as a criticism of those force fields. They are excellent tools that were developed for specific purposes and have been continually improved upon for decades. They have each demonstrated innumerable times that they produce high-quality predictions of protein structures and properties. Rather, this should be interpreted as the observation that the statistical preferences inherent in their energy functions emerge when analyzing over 100,000 amino acid mutations. That the preferences would emerge was expected and was why three different force fields were used in this study. However, that the preferences would differ to the extent they do, especially for nonpolar amino acids and serine, was not anticipated. Those differences lead the authors to conclude that the similarity matrices created here can be used in force field specific applications, but further work is needed to identify a general PPI similarity matrix with consensus behaviors.

Fig 1. Average difference scores for similarity matrices. These values represent how much a matrix favors (positive) or disfavors (negative) mutations relative to another matrix. Numbers are colored based on their absolute values, with smaller magnitude numbers (i.e., for matrices that are more similar to one another) in red and larger magnitude numbers in blue. The antibody-protein PPI matrices calculated from the same force field (e.g., CHARMM antibodies and antigens) each have small difference scores from one another. The most notable remaining similarities are those between BLOSUM80 and the CHARMM matrices and those between the Rosetta and Amber matrices. https://doi.org/10.1371/journal.pone.0293606.g001

Fig 2. Error scores between similarity matrices. These values are a measure of how different equivalent mutations are in the different matrices. The values are shown using a heat map to make trends easier to visually identify, with smaller numbers in red and larger values in blue. The matrices calculated by the same force field (e.g., the CHARMM antibody and antigen matrices) have low scores. The error scores for the six matrices calculated in this work indicate similarity between the matrices, with a maximum value of 1.509. In contrast, the minimum error score of those matrices with any other matrix is 1.922. https://doi.org/10.1371/journal.pone.0293606.g002